55 research outputs found

    A Study of Deep Learning Robustness Against Computation Failures

    For many types of integrated circuits, accepting higher failure rates in computations can improve energy efficiency. We study the performance of faulty implementations of certain deep neural networks based on pessimistic and optimistic models of the effect of hardware faults. After identifying the impact of hyperparameters such as the number of layers on robustness, we study the ability of the network to compensate for computational failures through an increase in network size. We show that some networks can achieve equivalent performance under faulty implementations, and we quantify the required increase in computational complexity.
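The abstract leaves the fault models abstract; as a rough sketch, assuming a pessimistic model in which a faulty neuron outputs zero instead of its true activation (the function name and model details are illustrative, not taken from the paper), fault injection into an MLP forward pass could look like:

```python
import numpy as np

def faulty_forward(x, weights, p_fault, rng):
    """Forward pass through a ReLU MLP in which each neuron's output is
    replaced by 0 with probability p_fault -- a simple pessimistic model
    of a failed computation."""
    h = x
    for W in weights:
        h = np.maximum(W @ h, 0.0)               # fault-free ReLU layer
        mask = rng.random(h.shape) >= p_fault    # False marks a faulty neuron
        h = h * mask
    return h
```

Sweeping `p_fault` while widening the layers would mimic the paper's question of how much extra network size buys back the lost accuracy.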

    Modeling and Energy Optimization of LDPC Decoder Circuits with Timing Violations

    This paper proposes a "quasi-synchronous" design approach for signal processing circuits in which timing violations are permitted, without the need for a hardware compensation mechanism. The case of a low-density parity-check (LDPC) decoder is studied, and a method for accurately modeling the effect of timing violations at a high level of abstraction is presented. The error-correction performance of code ensembles is then evaluated using density evolution while taking into account the effect of timing faults. Following this, several quasi-synchronous LDPC decoder circuits based on the offset min-sum algorithm are optimized, providing a 23%-40% reduction in energy consumption or energy-delay product while achieving the same performance and occupying the same area as conventional synchronous circuits. Comment: To appear in IEEE Transactions on Communications
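The offset min-sum check-node update mentioned above is a standard LDPC building block; a minimal sketch (the offset value `beta=0.5` is an illustrative choice, not taken from the paper):

```python
import numpy as np

def offset_min_sum_check(msgs, beta=0.5):
    """Check-node update of the offset min-sum algorithm: for each edge,
    the outgoing magnitude is the minimum magnitude over the *other*
    incoming messages minus an offset beta (clipped at zero), and the
    outgoing sign is the product of the other incoming signs."""
    msgs = np.asarray(msgs, dtype=float)
    out = np.empty(len(msgs))
    for i in range(len(msgs)):
        others = np.delete(msgs, i)
        mag = max(np.min(np.abs(others)) - beta, 0.0)
        sign = np.prod(np.sign(others))
        out[i] = sign * mag
    return out
```

The offset compensates for min-sum's overestimation of message magnitudes relative to the exact sum-product rule.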

    Layerwise Noise Maximisation to Train Low-Energy Deep Neural Networks

    Deep neural networks (DNNs) depend on the storage of a large number of parameters, which consumes a significant portion of the energy used during inference. This paper considers the case where the energy usage of memory elements can be reduced at the cost of reduced reliability. A training algorithm is proposed to optimize the reliability of the storage separately for each layer of the network, while incurring a negligible complexity overhead compared to conventional stochastic gradient descent training. For an exponential energy-reliability model, the proposed training approach can decrease the memory energy consumption of a DNN with binary parameters by 3.3× at iso-accuracy, compared to a reliable implementation. Comment: To be presented at AICAS 202
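As an illustration of the two ingredients described above, here is a minimal sketch assuming an exponential energy-reliability model of the form E(p) = -e0·ln(p)/c and independent sign flips of binary weights; the constants and function names are assumptions, not the paper's:

```python
import numpy as np

def memory_energy(p_flip, c=1.0, e0=1.0):
    """Illustrative exponential energy-reliability model: storing a bit with
    flip probability p costs e0 * (-ln p) / c, so higher reliability
    (smaller p) is exponentially more expensive to buy."""
    return e0 * (-np.log(p_flip)) / c

def flip_binary_weights(w, p_flip, rng):
    """Simulate unreliable storage of binary (+/-1) weights: each weight's
    sign is flipped independently with probability p_flip."""
    flips = rng.random(w.shape) < p_flip
    return np.where(flips, -w, w)
```

A layerwise training scheme would inject `flip_binary_weights` noise with a per-layer `p_flip` and trade the resulting accuracy against the summed `memory_energy` of all layers.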

    Sharpness-Aware Training for Accurate Inference on Noisy DNN Accelerators

    Energy-efficient deep neural network (DNN) accelerators are prone to non-idealities that degrade DNN performance at inference time. To mitigate such degradation, existing methods typically add perturbations to the DNN weights during training to simulate inference on noisy hardware. However, this often requires knowledge of the target hardware and leads to a trade-off between DNN performance and robustness, decreasing the former to increase the latter. In this work, we show that applying sharpness-aware training, by optimizing for both the loss value and the loss sharpness, significantly improves robustness to noisy hardware at inference time while also increasing DNN performance. We further motivate our results by showing a high correlation between loss sharpness and model robustness. We show superior performance compared to injecting noise during training and to aggressive weight clipping on multiple architectures, optimizers, datasets, and training regimes, without relying on any assumptions about the target hardware. This is observed on a generic noise model as well as on accurate noise simulations from real hardware. Comment: Preprint
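Sharpness-aware training of this kind is commonly implemented with a two-step SAM-style gradient: ascend to an approximate worst-case point nearby, then take the gradient there. A minimal numpy sketch (the radius `rho=0.05` is an illustrative default, not from the paper):

```python
import numpy as np

def sam_gradient(w, loss_grad, rho=0.05):
    """One sharpness-aware (SAM-style) gradient: move the weights to the
    approximate worst case within an L2 ball of radius rho (first-order
    ascent), then return the gradient evaluated at that perturbed point."""
    g = loss_grad(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # normalized ascent direction
    return loss_grad(w + eps)                    # gradient at the sharp point
```

Minimizing with this gradient penalizes sharp minima, which is what links low loss sharpness to robustness against weight perturbations at inference time.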

    VLSI Implementation of Deep Neural Network Using Integral Stochastic Computing

    The hardware implementation of deep neural networks (DNNs) has recently received tremendous attention: many applications in fact require high-speed operations that suit a hardware implementation. However, numerous elements and complex interconnections are usually required, leading to a large area occupation and high power consumption. Stochastic computing has shown promising results for low-power, area-efficient hardware implementations, even though existing stochastic algorithms require long streams that cause long latencies. In this paper, we propose an integer form of stochastic computation and introduce some elementary circuits. We then propose an efficient implementation of a DNN based on integral stochastic computing. The proposed architecture has been implemented on a Virtex7 FPGA, resulting in 45% and 62% average reductions in area and latency compared to the best architecture reported in the literature. We also synthesize the circuits in a 65 nm CMOS technology and show that the proposed integral stochastic architecture results in up to a 21% reduction in energy consumption compared to the binary radix implementation at the same misclassification rate. Due to the fault-tolerant nature of stochastic architectures, we also consider a quasi-synchronous implementation, which yields a 33% reduction in energy consumption with respect to the binary radix implementation without any compromise on performance. Comment: 11 pages, 12 figures
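Conventional stochastic computing encodes a value in [0, 1] as a Bernoulli bit stream, and the integral form described above generalizes this to integer-valued streams. A minimal sketch of both encodings (stream lengths and helper names are illustrative, not the paper's circuits):

```python
import numpy as np

def to_stream(x, length, rng):
    """Unary stochastic stream: each bit is 1 with probability x, x in [0, 1]."""
    return (rng.random(length) < x).astype(int)

def sc_multiply(x, y, length, rng):
    """Classic stochastic multiply: the bitwise AND of two independent unary
    streams has mean approximately x * y."""
    return to_stream(x, length, rng) & to_stream(y, length, rng)

def integral_stream(x, length, m, rng):
    """Integral stochastic stream: the elementwise sum of m independent unary
    streams; each element is an integer in [0, m] with mean m * x, trading a
    wider datapath for shorter streams at a given precision."""
    return sum(to_stream(x, length, rng) for _ in range(m))
```

The integer streams are what let the architecture cut latency: the same accuracy needs fewer clock cycles than single-bit streams.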

    Learning Energy-Efficient Hardware Configurations for Massive MIMO Beamforming

    Hybrid beamforming (HBF) and antenna selection are promising techniques for improving the energy efficiency (EE) of massive multiple-input multiple-output (mMIMO) systems. However, the transmitter architecture may contain several parameters that need to be optimized, such as the power allocated to the antennas and the connections between the antennas and the radio frequency chains. Therefore, finding the optimal transmitter architecture requires solving a non-convex mixed-integer problem in a large search space. In this paper, we consider the problem of maximizing the EE of fully digital precoder (FDP) and hybrid beamforming (HBF) transmitters. First, we propose an energy model for different beamforming structures. Then, based on the proposed energy model, we develop an unsupervised deep learning method to maximize the EE by designing the transmitter configuration for FDP and HBF. The proposed deep neural networks can provide different trade-offs between spectral efficiency and energy consumption while adapting to different numbers of active users. Finally, to ensure that the proposed method can be implemented in practice, we investigate the ability of the model to be trained exclusively using imperfect channel state information (CSI), both for the input to the deep learning model and for the calculation of the loss function. Simulation results show that the proposed solutions can outperform conventional methods in terms of EE while being trained with imperfect CSI. Furthermore, we show that the proposed solutions are less complex and more robust to noise than conventional methods. Comment: This preprint comprises 15 pages and features 15 figures. Copyright may be transferred without notice.
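The unsupervised objective described above can be illustrated by maximizing energy efficiency directly, i.e. minimizing its negative as the training loss; a minimal sketch assuming EE is the ratio of sum rate to total consumed power (the paper's energy model is more detailed than this):

```python
import numpy as np

def sum_rate(sinrs):
    """Spectral efficiency (bits/s/Hz) summed over users from their SINRs."""
    return float(np.sum(np.log2(1.0 + np.asarray(sinrs, dtype=float))))

def energy_efficiency(rate, p_transmit, p_circuit):
    """Energy efficiency: achieved rate divided by total consumed power."""
    return rate / (p_transmit + p_circuit)

def unsupervised_loss(sinrs, p_transmit, p_circuit):
    """Negative EE: minimizing this trains a network to maximize EE without
    any labeled 'optimal' transmitter configurations."""
    return -energy_efficiency(sum_rate(sinrs), p_transmit, p_circuit)
```

Because the loss is computed from the network's own output configuration, no supervised targets are needed, which is what makes the method unsupervised.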

    Relaxed Half-Stochastic Belief Propagation

    Low-density parity-check codes are attractive for high-throughput applications because of their low decoding complexity per bit, but also because all the codeword bits can be decoded in parallel. However, achieving this in a circuit implementation is complicated by the number of wires required to exchange messages between processing nodes. Decoding algorithms that exchange binary messages are interesting for fully-parallel implementations because they can reduce the number and the length of the wires and increase logic density. This paper introduces the Relaxed Half-Stochastic (RHS) decoding algorithm, a binary-message belief propagation (BP) algorithm that achieves a coding gain comparable to the best known BP algorithms that use real-valued messages. We derive the RHS algorithm by starting from the well-known Sum-Product algorithm, and then derive a low-complexity version suitable for circuit implementation. We present extensive simulation results on two standardized codes having different rates and constructions, including low bit error rate results. These simulations show that RHS can be an advantageous replacement for existing state-of-the-art decoding algorithms when targeting fully-parallel implementations.
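The key to keeping real-valued accuracy with 1-bit messages is successive relaxation: each node folds incoming binary messages into a running LLR estimate. A minimal sketch of that update (the parameter names and the relaxation factor `beta` are illustrative assumptions, not the exact RHS rule):

```python
def relaxed_llr(llr, bit, beta=0.05, step=1.0):
    """Successive-relaxation update: fold one binary message (0 or 1) into a
    running log-likelihood-ratio estimate. The estimate is an exponentially
    weighted average that converges toward step * E[2*bit - 1]."""
    b = 2 * bit - 1  # map {0, 1} -> {-1, +1}
    return (1.0 - beta) * llr + beta * step * b
```

A small `beta` averages over many stochastic bits, recovering soft information from a hard, 1-bit wire.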

    RSSI-Based Hybrid Beamforming Design with Deep Learning

    Hybrid beamforming is a promising technology for 5G millimetre-wave communications. However, its implementation is challenging in practical multiple-input multiple-output (MIMO) systems because non-convex optimization problems have to be solved, introducing additional latency and energy consumption. In addition, the channel state information (CSI) must be either estimated from pilot signals or fed back through dedicated channels, introducing a large signaling overhead. In this paper, a hybrid precoder is designed based only on received signal strength indicator (RSSI) feedback from each user. A deep learning method is proposed to perform the associated optimization with reasonable complexity. Results demonstrate that the obtained sum rates are very close to those obtained with optimal but complex full-CSI solutions. Finally, the proposed solution greatly increases the spectral efficiency of the system compared to existing techniques, as minimal CSI feedback is required. Comment: Published in IEEE-ICC202

    A relaxed half-stochastic decoding algorithm for LDPC codes

    When considering error-correction codes for applications, the most important aspect of a coding scheme is the trade-off between error-correction performance and cost. This work studies the decoding of LDPC codes and presents a new iterative decoding algorithm that represents likelihoods as binary stochastic streams but uses some elements of the sum-product algorithm in its variable nodes. To convert the stochastic streams to a log-likelihood ratio (LLR) representation, the algorithm uses the principle of successive relaxation. Because likelihoods are represented as stochastic streams, processing nodes exchange only 1-bit messages, which results in a low-complexity interleaver. Simulations show that the proposed algorithm achieves excellent error-correction performance and can outperform the floating-point sum-product algorithm. We also study the problem of error floors in LDPC codes and the manner in which it can be addressed at the level of the decoder. After reviewing existing solutions, an alternative technique for lowering error floors is presented. The technique, called redecoding, relies on the randomized progress of a decoding algorithm to successfully decode problematic frames. Simulation of one code shows that redecoding removes the floor at least down to a bit error rate of 10^-12.
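The redecoding idea relies only on the decoder's randomized progress: frames that fail are simply decoded again with fresh randomness. A minimal sketch, assuming a decoder callable that takes a frame and a random generator (the interface is hypothetical, not from the thesis):

```python
import random

def redecode(decode, frame, max_attempts=5, seed=0):
    """Redecoding: because the decoder's progress is randomized, a frame that
    fails on one attempt may succeed on a later attempt seeded differently."""
    for attempt in range(max_attempts):
        rng = random.Random(seed + attempt)  # fresh randomness per attempt
        ok, bits = decode(frame, rng)
        if ok:
            return ok, bits
    return False, None
```

Since only the rare problematic frames are retried, the average decoding throughput is barely affected while the error floor drops.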